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CHAPTER   I 
INTRODUCTION 


Optimization  can  be  defined  as  the  task  of  finding  the  minimum  or  maximum 
vahie  of  any  function.  There  are  different  types  of  problems  in  this  field  of 
optimization,  and  so  there  also  are  different  techniques  of  optimization.  This  thesis 
deals  with  a  stochastic  optimization  technique  for  optimization  of  functions  which 
depend  on  many  independent  parameters. 

Optimization  problems  can  be  "constrained"  or  "unconstrained" .  In  the  first 
case,  the  function  is  to  be  optimized  by  taking  care  of  the  fact  that  it  must  always 
satisfy  a  set  of  predefined  conditions  or  "constraints".  For  eaxmple,  in  this  thesis, 
we  minimize  a  fimction  with  the  constraint  that  the  independent  variables  can  never 
be  negative.  This  is,  therefore,  a  constrained  optimization  problem.  On  the  other 
hand,  if  the  fimction  is  free  from  any  constraints,  then  we  have  the  unconstrained 
optimization  problem. 

The  complexity  of  an  optimization  task  depends  on  the  complexity  of  the 
function  to  be  optimized,  and  also  on  the  constraints  involved.  In  fact,  the 
complexity  is  directly  related  to  the  number  of  independent  variables  governing 
the  function,  or  to  say  technically,  on  the  degrees  of  freedom  of  the  function. 
The  higher  this  degree  of  freedom,  the  more  complex  the  optimization  problem 
is.  Technically,  the  function  that  is  to  be  minimized  or  maximized  is  called  the 
"cost  function"  or  the  "objective  function".   In  practice,  it  is  not  very  uncommon 
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to  come  across  situations  where  the  optimization  process  involves  hundreds  of 
independent  variables,  restilting  in  a  very  high  dimensional  problem,  and  in  these 
cases,  the  conventional  deterministic  approaches  do  not  work  very  efficiently,  or 
sometimes,  are  practically  inapplicable  because  of  the  constraints  in  time  eind 
machine  comput ability.  Even  the  familiar  technique  of  matrix  inversion  can 
be  computationally  very  cimtibersome  for  large  matrices.  Besides,  this  familiar 
technique  of  matrix  inversion  may  fail  in  those  situations  where  the  matrices 
are  singular  and  hence  non-invertible.  In  these  cases,  stochastic  optimization 
approaches  produces  a  higher  probability  of  finding  the  optimized  state  with  higher 
computational  efficiency.  The  stochastic  optimization  approach  studied  in  this  thesis 
is  that  of  "Simtdated  Armealing"  which  will  be  discussed  later  in  this  thesis.  This 
optimization  technique  is  so  ceilled  because  of  the  close  similarity  of  the  concept 
behind  it  to  the  process  of  metallurgical  annealing. 

In  this  work,  the  method  of  "simulated  annealing"  has  been  applied  for  solving  a 
least  squares  optimization  with  a  particular  appUcation  to  the  inverse  problem.  The 
goal  here  is  to  study  the  effects  of  several  parameters  (discussed  in  later  chapters) 
on  a  particular  simulated  aimeeiling  algorithm  developed  here,  and  we  do  not  intend 
to  find  the  best  possible  optimization  algorithm.  In  fact,  the  results  obtained  by  the 
current  algorithm  are  quite  poor.  However,  they  do  show  the  effects  of  the  diff"erent 
parameters  on  the  algorithm. 

We  give  here  a  map  of  the  contents  of  this  thesis.  Chapter  II 
explains  the  inverse  problem  that  is  dealt  with  in  this  thesis.  Chapter  III 
explains  stochastic  optimization  techniques,  and  compares  them  to  deterministic 

4 


optimization  techniques.  Chapter  IV  explains  the  application  of  simulated 
annealing  to  least  squares  optimization  with  particular  application  to  the  inverse 
problem.  Fineilly  Chapter  V  presents  the  results  and  application. 


CHAPTER    II 
THE    INVERSE    PROBLEM 


In  this  chapter  we  will  give  a  general  description  of  a  class  of  inverse  problems, 
along  with  the  particvdar  inverse  problem  that  is  of  interest  in  this  thesis. 

An  important  type  of  problem  arises  when  we  know  a  function  g{a,  b),  a  ^  R, 
b  £  R,  (where  R  refers  to  the  real  space),  with 

g{a,b)=  f  K{a,b,ij)f{w)du;  (2.1) 

where  /if  is  a  known  function  called  the  'Kernel',  /  is  an  unknown  function,  and 
w  is  a  variable  of  integration  such  that  w  G  fi,  the  domain  of  integration,  and 
the  task  is  to  construct  the  function  /,  the  unknown  function  [1].  Prom  the 
computational  point  of  view,  the  task  is  to  find  {/(u'i)}jLi,  fc  G  TV,  for  a  set  of 
values  of  Wj  G  fi,  given  a  set  of  values  oi  g  (i.e.  the  set  {gi,g2,  ■•■■■, gk})  while  K 
is  a  known  function.  Here,  gi  means  g{ai,bi)  where  Oj  and  bi  are  some  values  of 
a,  and  b.  This  class  of  problems  is  very  commonly  encountered  in  many  fields, 
for  example  in  analyzing  remote  sensing  data,  reconstructing  three  dimensional 
images  from  a  set  of  two  dimensional  perspectives,  etc. 

There  are  various  methods  that  can  be  used  to  solve  this  inverse  problem. 
One  may  apply  the  method  of  matrix  inversion  [2]  by  converting  this  continuous 
problem  into  a  discrete  one,  with  the  assumption  that  the  function  /(u;,)  remains 
fairly  constant  in  the  neighborhood  of  the  coordinates  {loj}]'^^.  The  discretization 
is  implemented  by  segmenting  the  continuous  domain  fi  of  the  fimction  /  into  k 
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discrete  regions  over  which  /  is  asstuned  to  remain  constant.  In  this  case,  we  can 
write  down  k  equations  corresponding  to  the  set  of  fc  values  of  {gi,g2, ,9k}  as 


9i 


=  K'{aubuu;i)f{u>i)  +  K'{ai,h,iV2)f{t02)  + +  K'{ai,buU)k)fM 


92  =  K'{a2,b2,u)i)f{u;i)  +  K'{a2,b2,U2)fM  + +  K'{ai,bi,u>k)f{u^k) 


9k  =  K'{ak,bk,<j>)i)f{u)i)  +  K'{ak,bk,u;2)fM  + +  if'(afc,6fc,Wfe)/K) 

Here,  K'{ai,bi,u)j)  is  the  value  of  the  Kernel  over  a  neighborhood  of  Wj  computed 
through  the  relation 


K'{ai,bi,u;j)^   I    K{ai,bi,Lu)(L; 


where  Q,j  is  the  neighborhood  of  Wj.  The  above  set  of  equations  can  be  rewritten 
in  matrix  notation  as 

G  =  KF  (2.2) 

where  G  is  the  colmnn  matrix 


K  is  the  coefficient  matrix  whose  (ij)th  element  is  given  by  {fcjj}  =  K'(ai,  bi,u)j), 
and  F  is  the  matrix  given  by 

F  = 

V/K)/ 

So,  it  is  easy  now  to  see  that  the  inverse  probleiii  boils  down  tu  solving  the 
above  matrix  equation  to  get  F,  and  this  can  be  achieved  through  the  relation 
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F  =  K~^G.  Now  it  is  worth  noting  that  during  the  manipulations  of  the  equations 
so  far,  it  has  been  assumed  that  the  number  of  data  points  gi  is  at  least  equal  to 
the  number  of  segments  of  fl  over  which  we  intend  to  find  the  values  of  the  fvmction 
/.  Hence.this  method  fails  to  produce  amy  unique  solution  if  we  have  insufficient 
number  of  data  points  in  the  set  gi.  On  the  other  hand,  if  we  have  more  than 
k  data  points,  there  are  different  ways  of  handling  the  situation.  One  of  them  is 
to  consider  k  data  out  of  the  whole  set  such  that  these  are  the  least  correlated 
among  themselves,  so  that  we  get  the  maximmn  information  out  of  them,  and 
then  proceed  along  the  matrix  inversion  technique  described  above.  Even  if  we 
have  sufficient  data,  it  is  not  very  imcommon  in  practice  to  find  situations  where 
the  inversion  of  the  matrix  K  happens  to  be  an  unstable  process  because  of  the 
presence  of  very  small  entries  in  the  matrix,  resulting  in  the  inapplicability  of 
this  method  of  solving  for  F  by  coefficient-matrix  inversion  [2].  This  method, 
therefore,  has  a  very  restricted  use  since  its  applicability  depends  on  a  very 
stringent  set  of  conditions  which  are  quite  often  not  satisfied  in  practical  problems. 

To  avoid  these  difficulties,  the  inverse  problem  can  be  solved  as  an 
optimization  problem,  where  we  try  to  approximate  the  values  of  /  over  the 
segmented  domain  fi,  as  described  above,  and  the  goodness  of  the  approximation 
is  measured  through  a  "cost  function"  C  defined  by 

k 

C  =  J2{gi-gif  (2.3) 

1=1 

where  gi  are  the  actual  data  and  gi  are  the  values  of  the  fimction  g  corresponding 
to  the  approximate  values  of  /(u;,).  It  is  clear  that  the  motivation  behind  this 
technique  is  to  find  the  set  of  approximate  values  {/'{iHi)}  corresponding  to  the 
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minimum  value  of  the  "cost  function"  C  which  is  the  sum  of  the  squared  error  for 
each  set  of  locations  of  the  transmitter  and  the  receiver.  This  is,  therefore,  a  least 
squares  minimization  problem,  and  hence  can  be  solved  by  different  minimization 
or  optimization  techniques.  The  conventional  minimization  teclmiques,  such  as 
the  gradient  search  method,  work  well  if  the  cost  function  is  purely  convex,  and 
it  does  find  the  global  minimmn  in  a  computationally  efficient  manner.  But 
this  does  not  perform  so  well  in  cases  where  the  cost  fimction  is  not  imiformly 
convex,  and  has  local  minima.  In  this  case,  it  can  easily  produce  a  local  minimum 
of  the  cost  function,  and  may  not  reach  the  global  minimum.  Besides,  &\1  exact 
methods  for  optimization  require  exponential  increase  in  computing  effort  with  the 
number  of  dimensions  N  [3],  so  that  in  practice,  exact  solutions  can  be  attempted 
only  on  problems  involving  a  few  hundred  independent  parameters.  A  different 
optimization  technique  is  therefore  required  to  treat  this  difficulty,  and  simulated 
annealing,  which  is  described  in  the  next  chapter,  is  one  such  remedy. 

We  wiU  now  describe  the  partictilar  inverse  problem  that  has  been  used  in 
tliis  thesis.  Physically,  this  problem  can  be  described  as  finding  the  index  of 
refraction  of  the  different  portions  of  the  Earth's  structure  below  its  surface, 
from  data  tziken  only  on  the  surface.  This  problem  was  solved  by  Professor  A. 
Reunm  at  the  Department  of  Mathematics  at  Kansas  State  University,  and  a 
treatment  on  the  recovery  of  underground  layers  from  data  taken  only  on  the 
surface  can  be  found  in  [4].  The  inverse  problem  dealt  with  in  this  thesis  is  a 
modification  of  the  problem  in  [5],  and  here  the  objective  is  to  recover  the  index 
of  refraction  as  a  function  of  a-  and  y  as  well  as  r,  and  hence  the  current  problem 
is  an  optimization  with  more  degrees  of  freedom  than  the  problem  in  [4] .  As  will 
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be  shown  here,  mathematically  this  problem  is  equivalent  to  an  inverse  problem 
which  can  often  result  in  a  singular  coefficient-matrix,  as  described  above,  and 
hence  demands  special  attention  as  far  as  the  methods  of  solving  this  problem 
are  concerned.  The  problem  of  detecting  geophysical  strata  from  surface  data  is 
of  much  practical  importance  in  fields  like  oil  exploration,  and  geophysical  fault 
detection  for  the  prediction  of  earthquakes,  etc.  For  this  purpose,  seismic  signal 
generators  are  used  which  can  generate  signals  that  can  travel  through  the  Earth's 
crust,  and  are  reflected  by  the  different  layers  of  the  undergroimd  strata  to  appear 
back  on  the  surface  of  the  Earth  with  some  modifications,  which  are  intimately 
related  to  the  underground  structure.   These  reflected  signals  are  the  sources  of 
information  about  the  structure  of  the  strata  below.  Mathematically,  the  reflected 
signal  properties  are  functions  of  the  index  of  refraction  of  different  regions  of 
the  underground  strata,  and  also  of  the  locations  of  the  signal  transmitter  and 
receiver.  A  fxmction  that  can  be  determined  from  the  reflected  signal  amplitude 
is  g{a,  b)  given  by  [5] 


j{a,b)=  f 


w{z) 


z  -  a 


-b 


.dz  (2.4) 


where  w{z)  is  the  difference  of  the  index  of  refraction  at  the  point  z  G  O  from  an 
assumed  constant  background  value,  and  a  e  A,  b  £  B  where  A  and  B  are  sets 

A  =  {a  :  a  is    a   location   of   the    transmitter}  (2.4a) 

B  =  {b  :b   is    a   location   of    the    receiver)  (2.46) 

In  practice,  the  function  in  the  above  equation  that  is  determined  from  experiment 
is  g{a,b)  and  the  fimction  to  be  computed  is  u'(r)  for  different  values  of  (z).  This 
is  an  inverse  problem  when  looked  at  from  a  mathematical  point  of  view,  because 
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when  compared  with  the  equation  (2.1)  above,  it  is  seen  that  w{z)  corresponds 
to  /(u>)  and  \j_J.\t-b\  corresponds  to  K{cj). 

■       /       /       /       / 


t 


/ 
/ 


i   unit 
2  units. 


Fig  -  1 

In  the  present  problem,  a  three  dimensional  block  of  Earth's  crust  is 
considered,  (see  figure  above)  and  its  dimensions  are  chosen  to  be  of  1,  10,  and 
10  imits  in  the  x,  y,  and  z  directions  respectively.  The  x,  and  the  y  axes  are 
aligned  along  the  surface  of  the  Earth,  and  the  z  axis  is  perpendicular  to  the 
surface  pointing  directly  inwards.  This  is  the  region  fi  for  our  integration,  fi 
is  then  segmented  into  25  boxes,  each  having  a  volume  of  4  cubic  units.  The 
segmented  fi  is  shown  above.  Once  segmented,  it  is  assmned  that  the  index 
of  refraction  remains  constant  over  each  box,  and  the  integral  in  equation  (2.4) 
above  is  computed  over  each  box.  Therefore,  the  equation  for  ff(a,  b)  can  then  be 
transformed  into  the  following  form: 

25 


^K^)  =  E-^I,j,_«M.-6r^^ 


(2.5) 


where  6ox,-  refers  to  the  segmented  region  corresponding  to  the  ith  box  of  fi,  and 
Wi  refers  to  the  constant  value  of  the  index  of  refraction  over  the  hoti.  Therefore, 
we  see  that  once  the  integrals  over  each  of  the  boxes  are  computed,  it  is  possible 
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to  compute  the  values  of  g{a,b)  for  any  set  of  {wi}  by  simply  multiplying  the 
integrated  vjilue  by  the  corresponding  Wi,  called  weight  hereafter,  and  adding 
these  products  over  all  the  boxes.  The  objective  then  is  to  treat  this  inverse 
problem  as  a  least  squares  optimization  problem,  and  hence  minimize  the  cost 
function 

C=         Yl        i9ia,b)-9{a,b))'  (2.6) 

(a,b)6(AxB) 

where  A  x  B  is  the  Cartesian  product  of  the  two  sets  A  eind  B. 

This  is  briefly  the  description  of  the  inverse  problem  dealt  with  in  this  thesis, 
and  the  application  of  simulated  annealing  in  solving  it  will  be  discussed  in  chapter 
4. 
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CHAPTER  III 

STOCHASTIC  OPTIMIZATION 
AND  SIMULATED  ANNEALING 


Stochastic  optimization  refers  to  the  process  of  optimization  of  a  set  of 
parameters  for  a  system  through  random  guesses  of  the  peirameter  values,  and 
choosing  the  case  which  produces  the  best  result  out  of  all  the  different  guesses 
made.  This  is  the  crude  form  of  stochastic  optimization,  as  here,  the  different 
cases  bear  no  correlation  to  one  another,  resulting  in  independent  sets  of 
random  parameter  values  for  the  different  cases,  and  hence  there  is  no  consistent 
improvement  as  more  and  more  trials  are  implemented.  This  is,  therefore,  an 
extremely  inefficient,  but  conceptually  very  simple  way  of  attempting  to  achieve  an 
optimized  set  of  system  parameters.  To  illustrate  this  process,  let  us  assume  that 

the  system  to  be  optimized  is  dependent  on  a  set  of  'n'  parameters  si,S2, ,Sn 

which  take  on  values  in  a  space  fi  C  iE"  or  C",  where  R  is  the  real  space  and  C 

is  the  complex  space.  Let  us  define  a  reell  valued  system  function  F{si ,  33 , ,  5„) 

that  depends  on  these  'n'  system  parameters.    The  objective  is  to  find  the  set 

of  optimum  parameter  values  s-  (for  i  =  1  to  n),  in  fl  such  that  F{sl,S2, ,s^) 

takes  the  minimum  value.  In  the  case  of  the  crude  stochastic  optimization  described 
above,  the  task  during  each  trial  would  be  to  choose  a  random  set  of  the  .s,  values 

such  that  (51,  52, ,5„)  e  n,  and  then  compute  the  corresponding  value  of  the 

function  F.  The  choices  of  these  n-tuples  are  independent  in  each  trial  and  are 
purely  random.  This  is  the  concept  behind  the  method  of  the  basic  stochastic 
optimization. 

14 


An  improvement  over  the  crude  and  simple  stochastic  optimization  process 
described  above  is  to  have  a  stochastic  procedure  where  the  different  trials  are  not 
independent,  but  every  trial  is  based  on  the  state  of  the  system  produced  by  the 

immediately  preceding  trial,  resulting  in  a  set  of  system  parameters  (si ,  ^2, ,  3„), 

which  move  consistently  towards  an  optimized  state  (ajjjj, i^n)-  Here  the  set 

of  parameters  during  the  first  trial  are  chosen  randomly  from  Q  just  as  in  the  method 
of  the  first  paragraph,  but  the  parameters  for  the  subsequent  trials  are  obtained 
by  changing  the  previous  set  of  parameters  by  small  randomly  chosen  amounts, 
such  that  the  maximiun  absolute  value  of  (s""*"^  -  a")  is  bounded  above  by  a  real 
number  s  called  the  step  size.  The  maximum  is  taken  over  all  possible  i  where 
3?"''^  and  a"  denote  the  values  of  the  ith  component  of  the  system-parameter  vector 

(si,S2, ,Sn)  during  two  successive  trials  with  the  higher  valued  superscript 

representing  the  later  trial.  Thus  we  find  that  the  parameters  during  the  different 
trials  are  correlated.  Here,  the  system  function  F  is  computed  after  every  update 
of  the  n-dimensional  parameter  vector,  and  the  updated  parameters  are  accepted 

as  the  new  improved  parameters  if  the  new  value  of  F{si,S2, ,3„)  is  less  than 

or  equal  to  its  previous  value.  Otherwise,  the  new  parameter  vector  is  rejected, 
and  the  old  parameter  vector  is  retained.  The  process  is  repeated  over  and  over 
again,  and  thus  we  get  consistently  improved  parameter  vectors  during  the  tricds. 
This  is  a  meirked  difference  of  this  algorithm  with  the  one  described  in  the  first 
paragraph,  as  the  trials  there  were  independent,  and  hence  there  was  no  consistent 
improvement.  But  this  improved  version  also  has  a  problem  associated  with  it.  As 
mentioned  above,  this  algorithm  accepts  only  those  parameter  vectors  that  lower 
the  system  function,  and  rejects  all  those  updates  that  increase  it.  As  a  result,  once 
the  parameters  are  in  the  neighborhood  of  any  local  mininiuni  of  F,  tli^re  is  a  good 
chance  that  F  will  get  trapped  in  it  when  the  step  size  chosen  is  small,  because  in 
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that  case  almost  every  update  of  the  parameter  vector  will  take  the  system  function 
into  a  region  neeir  the  local  minimum,  and  hence  all  those  function  veilues  would 
be  higher  than  the  existing  value,  thus  all  those  updates  would  be  rejected.  Tliis 
will  result  in  a  solution  that  will  produce  a  local  minimum  and  not  the  globally 
optimized  state.  One  coidd  take  a  larger  step  size  to  come  out  of  a  local  mininumi, 
but  in  that  case  the  system  will  be  changed  considerably  in  each  step,  and  hence 
it  wiU  have  a  lower  chance  of  finding  a  mininuun.  To  improve  upon  this  problem, 
we  can  modify  the  scheme  of  accepting  the  updated  parameters,  so  that  the  system 
can  come  out  of  a  local  minimum.  This  method  is  called  "Simulated  Annealing", 
and  it  will  be  described  next. 

Simulated  aimealing  is  a  process  which  is  based  on  the  concept  of  "armealing" 
as  used  in  the  metallurgical  processes.  Metallurgical  annealing  is  the  process  of 
slowly  cooling  a  physiced  system  in  order  to  obtain  states  with  globally  minimum 
energy  [2].  In  this  process,  a  sample  is  first  heated  up  sufficiently  so  as  to  melt  it 
out  completely,  and  then  slowly  cooled  back  to  a  highly  ordered  state.  This  is  a 
statistical  mechanical  phenomenon  as  explained  below. 

A  system  composed  of  a  very  large  number  of  particles  (atoms)  may  be 
described  in  terms  of  the  average  behavior  of  the  particles  constituting  the  system. 
In  statistical  mechanics,  to  describe  the  average  behavior,  each  configuration  of  a 
system  is  defined  by  the  set  of  positions  r  of  the  particles  weighted  by  a  probability 
factor  of  the  form 

€    ir 

where  k  is  called  the  "Boltzmann  constant",  E  is  the  energy  of  the  system,  and 
T  is  the  temperature.    One  property  of  such  a  system  that  is  of  interest  is  the 
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behavior  at  low  temperatures.  Usually,  the  system  approaches  its  ground  state 
as  the  temperature  is  lowered.  But  in  practice,  low  temperature  alone  is  not  a 
sufficient  condition  for  attaining  the  ground  state  of  the  system.  Experiments  show 
that  the  final  state  of  a  system  at  low  temperature  depends  on  the  rate  at  which 
the  material  is  cooled  from  its  original  molten  state.  If  the  material  is  cooled  at 
a  very  slow  rate,  taking  care  to  keep  the  material  at  equilibrium  at  every  stage  of 
the  process,  then  the  find  state  attains  an  energy  which  is  much  lower  in  energy 
scale  than  the  configuration  attained  by  a  rapid  quenching  of  the  substance.  In  the 
first  case,  we  often  achieve  a  crystalline  solid  with  the  set  of  particles  distributed 
in  a  regular  pattern,  whereas  in  the  second  case,  we  achieve  a  glassy  state  with  a 
lot  of  defects  embedded  in  it.  Metallurgical  annealing  refers  to  this  process  of  slow 
cooling  of  an  initially  molten  substance  to  take  it  to  a  highly  ordered  state. 

This   concept    of  annealing   can   be    applied   to   the   optimization  problem 
mentioned  before  in  this  chapter,  where  we  attempt  to  minimize  a  system  function 

F   that   depends   on  the  n-dimensional  parameter  vectors   F{si,S2, ,Sn)    G 

Q.  Here  the  components  of  the  parameter  vector  wiU  act  as  the  coordinates 
of  the  constituting  particles  of  a  metallurgical  substance,  and  the  starting 
random  parameters  are  analogous  to  the  random  structure  of  the  initially  molten 
metallurgical  substance.  In  the  case  of  simulated  annealing,  we  define  a  parameter 
called  the  effective  temperature,  which  plays  the  role  of  temperature  T  in  actual 
annealing.  It  plays  an  important  role  in  deciding  the  acceptance  criterion  of  the 
updated  parameters  during  the  optimization.  The  updating  scheme  is  similar  to 
the  one  in  the  second  paragraph  of  this  chapter,  for  the  improved  version  of  the 
crude  stochastic  optimization  procedure.  After  an  update  of  the  parameter-vector 

has  been  made,  the  corresponding  system  function  F{si,S2, ,5„)  is  computed, 
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and  compared  to  the  previous  value.  If  the  new  value  is  less  thein  the  old  value,  then 
the  new  parameter  vector  is  accepted  as  the  new  parameter- vector  unconditionally. 
If  the  new  value  is  more  than  the  old  value,  then  the  difference  between  the  new 
and  the  old  values  is  computed  as 

dF  =  {Fnew  -  Fold) 

and  the  updated  parameter  is  accepted  with  a  probability  of 

p{dF)  =^  expij^) 

Tliis  can  be  simvdated  on  a  computer  by  choosing  a  random  number  from  a 
uniforirdy  distributed  pseudo-random  number  generator  such  that  we  have  equal 
probabiUty  of  getting  numbers  in  the  interval  (0,  1),  and  then  comparing  this 
number  with  the  computed  probabihty  of  p(dF).  If  this  random  number  is  less 
than  p(dF),  the  new  configuration  is  accepted,  else  it  is  rejected,  and  the  system 
stays  at  its  old  state.  We  then  start  over  and  repeat  the  procedure. 

The  primary  advantage  of  this  method  of  optimization  is  that  it  has  a 
higher  probabihty  of  reaching  the  global  minimum  thtm  the  improved  stochastic 
optimization  described  in  the  second  paragraph,  because  simvdated  anneahng  does 
have  a  chance  of  coming  out  of  any  local  minima.  When  a  move  increases  the 
function  value,  the  algorithm  generates  a  finite  non-zero  probability  of  accepting 
this  move,  giving  the  system  a  chance  to  get  into  a  'higher  energy'  state,  which 
helps  the  system  to  come  out  of  a  local  minimum.  This  is  a  marked  difference 
of  this  algorithm  from  the  algorithms  that  look  for  moves  that  only  lower  the 
function  value,  which  can  very  easily  get  trapped  in  any  local  niinimuni,  as  we  saw 
in  second  paragraph.  Simulated  annealing,  thus,  accepts  moves  which  (with  certain 
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probability)  even  drive  the  system  away  from  a  minimum  parameter  configuration 
whereas  the  scheme  in  the  second  paragraph  always  tries  to  drive  the  system  towards 
a  strictly  lower  energy  configuration. 

Vanderbilt  and  Louis  (1984)  compared  the  efficiency  of  Simulated  Annealing 
with  other  conventional  non-probabilistic  optimization  techniques  [1].  They  found 
that  Simulated  Annealing  is  compeirable  to  these  approaches  as  far  as  the  number 
of  function  evaluations  and  computer  time  is  concerned.  But,  in  case  of  simple  low 
dimensional  optimization  problems,  almost  any  of  the  non-probabilistic  approaches 
has  a  liigher  cheuice  of  finding  the  global  minimum  than  the  method  of  Simulated 
Annealing.  Specially  for  problems  with  clearly  convex  cost  functions,  it  is  always 
more  efficient  to  use  standard  gradient  descent  methods.  But,  Simulated  Annealing 
can  be  very  advantageous  for  optimization  problems  having  very  many  degrees 
of  freedom,  where  exhaustive  search  for  a  global  minimum  through  deterministic 
procedures  can  be  impossible  or  computationally  prohibitive. 
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CHAPTER  IV 

LEAST  SQUARE  OPTIMIZATION 

BY 

SIMULATED  ANNEALING 

APPLICATION  TO  THE  INVERSE  PROBLEM 

In  this  chapter,  we  will  discuss  the  application  of  simulated  annealing  to  least 
square  optimization,  with  a  particular  application  to  the  "Inverse  Problem"  as 
has  been  discussed  in  Chapter  II.  We  have  seen  in  Chapter  III  that  simulated 
annealing  is  effectively  a  method  for  mutidimensional  global  optimization,  and 
hence  for  applying  this  method  to  the  inverse  problem,  we  have  to  first  frame 
it  (the  inverse  problem)  into  an  optimization  problem.  It  was  discussed  in  Chapter 
II  how  the  inverse  problem  is  equivalent  to  a  least  squeire  optimization  problem. 
We  will  discuss  now  the  method  adopted  here  for  applying  simulated  annealing  to 
this  case,  eilong  with  the  other  methods  applied  to  similar  problems. 

There  eire  analytic  ways  of  solving  least  square  problems  when  the  number  of 
dimensions  is  small,  but  the  number  of  computations  grows  exponentially  with  the 
number  of  dimensions,  making  it  computationally  prohibitive  for  many  dimensions 
[1].  Besides,  the  analytic  method  of  solving  the  least  square  minimization  problem 
involves  inversion  of  the  well  known  Hilbert  matrix,  which  is  a  well-posed  but  ill- 
conditioned  problem  for  higher  dimensions  [2].  Hence  it  is  not  a  very  practical 
method  for  problems  involving  a  large  number  of  parameters. 

Attempts  have  been  made  to  solve  the  inverse  problem  described  in  Chapter  II 

20 


by  deterniinistic  methods,  £ind  one  such  method  has  been  developed  by  P.  Li 
and  Ranun  [1].  The  problem  discussed  in  their  work  is  to  recover  undergrotmd 
layers  from  surface  data.  Their  method  is  iterative,  and  the  iteration  should  be 
started  with  some  preferred  starting  point  to  reduce  the  chances  of  attaining  a 
local  minimum. 

The  stochastic  optimization  algorithm  studied  here  is  that  of  Simulated 
annealing,  and  the  particular  inverse  problem  of  interest  here  has  been  explained 
before.  Our  objective  is  to  recover  underground  segments  from  surface  data.  For 
this  purpose,  a  signal  is  sent  below  the  grovmd  from  a  transmitter  on  the  groimd, 
and  the  reflected  signal  is  recorded  by  receivers  on  the  ground.  The  data  are 
obtained  from  these  reflected  signals.  As  we  have  already  mentioned  (in  Chapter 
II),  the  data  g{a,  b)  are  given  by  equation  (2.5),  and  hence  the  integral 


/ 


box,  \z-a\  .\z-b 


(4.1) 


is  computed  first  for  Jill  the  boxes,  i.e.,  for  i  =  1  to  25,  eind  stored  for  later  use 
in  the  computation  of  g{a,h)  cind  gi  (see  Chapter  II).  This  integral  is  computed 
by  three  dimensional  Simpson's  integration  algorithm  for  its  relative  simpUcity. 
In  the  simulation  of  the  actual  seismic  data  (as  has  been  discussed  in  Chapter 
II),  we  assmne  a  particular  weight  distribution  among  the  different  boxes  in  the 
segmented  fi  which  is  the  three  dimensional  region  of  the  Earth's  crust  considered 
for  this  problem,  and  the  values  of  g{a,b)  computed  through  equation  (2.5)  for 
different  values  of  a  and  b  which  are  the  locations  of  the  transmitter  and  the  receiver 
respectively  on  the  surface  of  the  Earth.  These  are  then  taken  as  the  actual  seismic 
data  for  the  configuration  assiuned  in  our  simulation  of  the  experiment.  We  then 
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try  to  optimize  the  "cost  function"  C  as  expressed  by  equation  (2.3).  The  simulated 
annealing  algorithm  for  this  optimization  is  described  below. 

In  the  present  case,  we  have  computed  g[a,h),  as  described  above  ,  for  11 
different  values  of  a  and  11  different  values  of  b  for  each  of  the  values  of  a.  These 
11  locations  of  the  receiver  and  the  transmitter  correspond  to  the  points  on  the 
surface  of  the  block  uniformly  spaced  at  an  interval  of  1  unit,  starting  at  the  left 
hand  corner  and  going  up  to  the  right  hand  corner.  To  start  with,  we  assume 
that  the  actual  distribution  of  the  weight  is  unknown,  and  hence  we  assign  random 
coefficients  tt;  G  (0,  2)  in  each  of  the  boxes.  This  range  of  0  to  2  for  the  initial  values 
of  coefficients  is  chosen  because  the  maximum  coefficient  chosen  in  simulating  the 
actual  values  of  g\a,h)  is  1,  and  we  tried  to  test  the  algorithm  with  a  starting 
configuration  which  does  not  assign  a  coefficient  which  is  very  different  from  the 
actual  value.  We  will  see  in  the  next  Chapter  how  the  algorithm  behaves  when  this 
condition  is  not  satisfied.  Once  the  starting  coefficients  are  assigned  randomly  to 
each  of  the  boxes  in  fi,  we  compute  C  by  the  equation 

121 

C  =  Y.^9i  -  hf  (4.2) 

»=i 

where  gi  refers  to  the  value  of  fli(a,  h)  for  the  tth  data  out  of  the  121  total  simulated 
seismic  data,  and  g^  refers  to  the  value  of  g  for  the  same  locations  of  the  receiver  and 
the  transmitter.  The  next  step  in  the  algorithm  is  to  add  (or  subtract)  a  certain 
amoimt  from  the  existing  value  of  the  weight  of  a  randomly  chosen  box.  Actually, 
this  amount  that  is  added  (or  subtracted)  has  an  upper  bound  of  0.1.  which  we  call 
the  step  size  signifying  the  fact  that  in  a  single  step,  we  cannot  change  the  value 
of  the  weight  of  any  box  by  more  than  this  value.  After  this  change  in  coeffficient  is 
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made  in  a  box,  the  new  value  of  C  is  computed  and  compared  to  the  previous  veilue. 
If  the  new  value  is  less  than  or  equal  to  the  old  value,  then  the  updated  weight 
is  accepted  for  the  corresponding  box,  otherwise  it  is  accepted  with  a  Boltzman 
probability  distribution.  For  this  purpose,  the  difference  between  the  new  and  the 
old  cost  functions  AC  is  computed  through  the  following  relation: 

AC  =  {Cner.  '  Cold)  (4.3) 

and  this  difference  is  positive  when  the  new  C  is  more  than  the  old  C.  This  update 
is  then  accepted  with  a  probability  of 


p(AC)  =  e^^  (4.4) 


where  T  is  the  effective  temperature  at  that  stage  of  the  algorithm.  The  update 
is,  therefore,  accepted  if  a  number  chosen  from  a  pseudo-random  number  generator 
(which  generates  nearly  uniformly  distributed  numbers)  is  less  than  or  equal  to 
the  above  probability.  This  is,  in  brief,  the  acceptance  criterion  of  the  updates 
for  this  simulated  annealing  algorithm.  The  same  procedttre  is  repeated  a  large 
number  of  times  at  each  temperature  before  the  temperature  is  reduced,  so  that 
the  configiuration  gets  a  chance  to  equlibrate  at  any  temperature,  which  is  importjint 
in  the  case  of  annealing,  as  has  been  said  before.  At  each  temperature,  the  fraction 
called  equilibrium  ratio,  of  the  moves  that  increase  the  cost  function  out  of  the 
total  number  of  accepted  moves  from  a  group  of  250  total  moves  is  computed,  and 
if  this  fraction  is  close  to  0.5,  it  is  assmned  that  the  configuration  has  attained 
equilibrimn  at  that  temperature,  because  there  are  as  many  accepted  moves  that 
increase  the  cost  function,  as  there  are  moves  that  reduce  it.  So  the  configuration, 
on  the  average,  would  stay  the  same  even  if  more  and  more  updates  are  attempted 
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at  that  temperature.  This  is  the  point  when  the  temperature  is  reduced  by  a 
factor  called  the  Cooling  Rate.  The  temperature  is  also  reduced  when  we  have 
already  made  4000  moves,  and  the  equilibrimn  ratio  is  still  away  from  0.5.  But,  if 
the  above  fraction  is  away  from  0.5,  and  the  total  nmnber  of  moves  made  at  that 
temperature  is  less  than  4000,  then  the  configuration  has  not  reached  equilibrium, 
and  we  attempt  more  updates  in  groups  of  250  moves,  with  the  equilibrium  ratio 
computed  after  each  group  of  250  moves,  to  determine  if  the  temperature  is  to  be 
reduced  or  not.  We  chose  equilibrium  ratio  to  be  computed  after  250  moves 
because,  there  are  25  boxes  in  total,  and  hence  on  the  average  each  box  will  have 
a  chance  of  getting  updated  10  times  in  a  block  of  250  updates.  In  practice,  the 
temperature  is  reduced  when  the  value  of  equHbrium  ratio  is  within  a  certain 
range,  called  the  ratio-tolerance,  of  0.5,  i.e.,  the  temperature  would  be  reduced  if 

I  equilibrium  ratio  —  0.5  |=  ratio  —  tolerance.  (45) 


The  above  procedure  is  repeated  at  each  temperatvire.  The  program  is 
terminated  when  the  fraction  of  the  total  nmnber  of  moves  that  are  accepted  at 
a  temperature  falls  below  a  preassigned  value  called  the  acceptance-tolerance, 
which  is  a  small  niunber  (it  is  0.01  for  our  case),  so  that  the  program  would  stop 
when  most  of  the  updates  are  not  being  accepted  any  more,  and  the  system  has 
nearly  reached  the  frozen  state. 

This  is  the  algorithm  of  the  annealing  algorithm  applied  to  the  inverse  problem 
described  in  Chapter  II. 
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CHAPTER  V 
RESULTS  AND  CONCLUSION 


In  this  chapter,  we  will  present  the  results  produced  by  the  application  of 
the  simulated  annealing  algorithm  described  in  Chapter  4  on  the  inverse  problem. 
Initially,  in  all  the  simulations,  it  is  assumed  that  the  actual  distribution  of  the 
weight  in  the  slab  of  the  Earth's  crust  described  before  consists  of  indices  of  unity 
in  the  boxes  which  are  in  the  4th  row  -  second  colvmm,  and  4th  row-4th  column, 

and  zero  everywhere  else.  This  configuration  is  shown  below: 

0  0     0  0     0 

0  0     0  0     0 

0  0     0  0     0 

0  10  10 

0  0     0  0     0 

The  data  {i?i}]it  are  generated  for  121  diiferent  combinations  of  the  receiver  and 
the  transmitter  locations  on  the  surface  of  the  Earth,  corresponding  to  11  different 
locations  of  the  transmitter  and  the  receiver,  and  the  algorithm  is  tested  on  the 
basis  of  how  closely  it  can  reproduce  this  configuration  starting  with  a  random 
weight  distribution.  The  reader  should  be  warned  that  the  fits  will  be  poor,  as 
discussed  in  the  Introduction.  We  axe  studying  the  effects  of  several  parametes  on 
this  algorithm,  and  not  trying  to  find  the  best  algorithm. 

First  of  all,  the  algorithm  was  tested  to  see  the  effect  of  round-off  error  on  the 
final  results.  These  round-off  errors  are  produced  because  of  the  finite  precision  of 
the  computers,  and  a  detailed  theoretical  treatment  of  this  particular  topic  can  l)e 
found  in  any  nvmierical  analysis  book  such  as  [  1  ].  For  our  algorithm,  we  set  the 
step  size  at  0.1,  starting  temperature  at  5,  cooling  rate  at  0.9.  number  of 
iterations  per  temperature  at  1000,  and  acceptance-tolerance  at  0.1.  These 
values  were  chosen  because  a  step  size  of  0.1  is  not  a  very  large  step  size  as  compared 
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to  the  actual  maximum  weight  in  a  box,  neither  is  it  a  very  small  value  compared 
to  the  same.  A  starting  temperature  of  5  indicated  that  initially  almost  98  percent 
of  the  total  nmnber  of  attempted  moves  were  being  accepted  indicating  that  the 
configuration  at  that  temperatiure  was  analogous  to  the  molten  state  in  an  actual 
annealing  procedure.  A  cooling  rate  of  0.9  was  adopted  because  it  was  used  in  the 
simulated  annealing  algorithm  designed  by  S.  Kirpatrick,  C.  D.  Gelatt,  Jr.,  and 
M.  P.  Vecchi  [  2  ],  and  they  foimd  it  to  be  a  reasonably  slow  cooling  rate  for  the 
annealing  algorithm.  The  acceptance- tolerance  was  chosen  to  be  0.1  because  it  was 
found  that  below  this  vedue,  the  execution  time  goes  up  considerably  without  any 
significant  reduction  in  the  final  cost  fimction.  1000  iterations  at  each  temperatiu-e 
would  give  every  box  the  chance  to  be  updated  40  times  on  the  average.  With  these 
parameter  values,  the  algoritlim  was  checked  in  both  single  and  double  precision 
modes. 

In  the  single  precision  case,  it  was  found  that  the  algorithm  produces  weight 
distributions,  whose  deviations  from  the  actual  distribution  did  not  follow  a 
statistical  behavior,  as  it  was  foimd  that  when  squared  errors  C  (the  cost  function 
for  this  problem)  from  different  trials  were  grouped  together  and  the  group  avergage 
computed,  it  did  not  go  down  as  the  square  root  of  the  number  of  groups 
considered,  which  would  be  expected  of  any  statistical  phenomenon.  In  fact  it  was 
roughly  constant.  This  suggests  the  presence  of  a  noise  term.  The  approximate 
statistical  Ijehavior,  however,  was  observed  in  the  values  of  C  obtained  in  the 
double  precision  mode.  We  also  found  that  the  average  value  of  C  produced  by 
the  single  precision  execution  of  the  annealing  algorithm  was  about  1.5  times  higher 
than  the  same  obtained  by  the  double  precision  execution  of  the  same  algorithm. 
This  strongly  hinted  at  the  significant  worsening  of  the  algoritlim  in  the  single 
precision  mode  because  of  round-ofl"  error  affecting  the  computation  of  the  ratio  of 
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^  specially  when  the  temperature  gets  small  because  then  a  small  rotmd-off  in 
T  will  be  magnified  many  times  as  tliis  quantity  appears  in  the  denominator.  The 
result  obtained  by  the  execution  of  the  algorithm  in  the  single  precision  is,  thus, 
meaningless,  as  what  we  observe  there  is  just  "noise"  because  of  round-off  and  not 
the  desired  quantity.  This  suggested  that  the  algorithm  does  not  work  in  single 
precision  (on  the  VAX)  and  must  be  executed  in  the  double  precision  mode  for  any 
further  studies.  The  results  obtained  in  the  single  and  the  double  precision  modes 
are  shown  in  Tables  1,2,  and  3. 

In  all  the  trials  above,  the  number  of  updates  at  each  temperature  was  fixed  at 
1000,  as  each  of  the  25  boxes  woidd,  on  the  average,  have  a  chance  to  get  updated  40 
times  at  each  temperature  which  was  assumed  to  be  sufficient  to  let  the  configuration 
equilibrate  at  each  temperatvu-e.  Here,  it  would  be  relevant  to  briefly  explain  the 
criterion  for  deciding  if  equilibrium  had  been  reached  at  a  temperature.  We  say  that 
equilibrium  has  been  attained  if  about  50  percent  of  the  total  number  of  accepted 
moves  at  a  temperature  increases  the  energy,  and  the  rest  decrease  it  (or  keep  it 
the  same).  This  makes  sense  because  at  this  stage,  even  if  we  keep  the  temperature 
the  same,  the  configuration  will  not  be  improved  on  the  average,  because  for  every 
move  that  decreases  the  cost  function,  there  will  be  a  move  that  will  increase  it.  We 
computed  the  ratio  called  the  equilibrium  ratio  (as  defined  earlier)  of  the  nmnber 
of  moves  decreasing  the  cost  function  to  the  total  number  of  moves  accepted  at 
each  temperature  at  the  end  of  1000  attempted  updates,  antl  it  was  found  that  this 
equilibrium  ratio  was  much  above  50  percent  at  some  of  the  temperatures,  giving 
us  the  indication  that  1000  iterations  per  temperature  is  not  always  sufficient  to 
attain  equilibrium.  At  some  of  the  temperatures,  it  was  foimd  that  as  high  as  70 
percent  of  the  total  number  of  accepted  moves  were  decreasing  the  cost  function 
when  the  temperature  was  lowered,  and  this  was  a  premature  cooling  because  we 
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TABLE    I 


Results  showing  the  non-statistical  behavior  of  the  standard 
error  for  double  precision  execution  of  the  anneaUng  algorithm 

for  step  size  of  0.1 


Number  of  groups        Standard  Error  a  Ratio  of  <r 


1 

2 

3 

4 

5 

6 

7 

8 

9 

10 

11 

12 


0 

7.145E-07 

1.127E-06 

1.57 

1.026E-06 

0.91 

1.070E-06 

1.04 

1.084  E-06 

1.01 

1.386E-06 

1.27 

1.412E-06 

1.02 

1.332E-06 

0.94 

1.27!)E-06 

0.96 

1.221E-06 

0.95 

1.351E-06 

1.11 

TABLE    II 


Results  showing  the  statistical  behavior  of  the  standard 
error  for  double  precision  execution  of  the  anneahng  algorithm 

for  step  size  of  0.1 


Number  of  groups        Standard  Error  a  Ratio  of  a 


1 
2 
3 

4 
5 
6 

7 
8 
9 


0 

1.241E-06 

1.459E-06 

1.17 

1.435E-06 

0.96 

1.301E-06 

0.90 

1.273E-06 

.97 

1.192E-06 

0.93 

1.123E-06 

0.94 

1.075E-06 

0.95 

TABLE    III 

Results  showing  the  statistical  behavior  of  the  standard 
error  for  double  precision  execution  of  the  annealing  algorithm 

Step  size  used  =  0.05 


Number  of  groups        Standard  Error  a  Ratio  of  (t 


1 
2 
3 

4 

5 

6 

7 

8 

9 

10 

11 

12 


0 

8.788E-07 

8.180E-07 

0.93 

7.423E-07 

0.90 

9.929E-07 

1.34 

9.348E-07 

0.94 

9.545E-07 

1.02 

9.096E-07 

0.95 

9.066E-07 

0.99 

8.632E-07 

0.95 

8.355E-07 

0.96 

8.168E-07 

0.97 

could  accept  more  moves  that  would  lower  the  cost  function  on  the  average  even  if 
we  stayed  at  that  temperature.  On  this  basis,  it  was  decided  to  change  the 
cooling  schedule  so  as  to  allow  the  configuration  at  each  temperature  to 
come  closer  to  an  equilibrium- ratio  of  0.5  before  being  cooled.  Here  the 
updates  were  implemented  in  groups  of  250  updates.  The  new  cooUng  schedule  is 
as  follows  for  each  temperature  diiring  the  annealing  process: 

1.  Liitiahze  the  variable  ngroup  to  0.  This  variable  keeps  track  of  the  number 
of  groups  attempted 

2.  ngroup  =  ngroup  +  1. 

3.  Make  a  group  of  250  updates. 

4.  If  this  is  the  first  group  of  updates  at  the  existing  temperature,  then  go 
to  step  2  (this  allows  the  configuration  to  be  closer  to  equilibrium  through  the 
first  250  updates).  If  this  is  not  the  first  group,  then  compute  the  equilibrium 
ratio. 

5.  If  the  total  nvunber  of  updates  accepted  so  far  at  this  temperature  is  less 
than  1  percent  of  the  total  number  of  attempted  updates,  or  no  moves  have 
been  accepted  from  the  last  group  of  updates,  then  stop  the  algorithm  (as  the 
configuration  is  assiuned  to  be  frozen  at  this  point).  Else  if  the  equilibrium 
ratio  hes  between  0.47  and  0.53  (it  is  assumed  that  the  configuration  is  very  close 
to  equihbrium  if  the  equilibrium  ratio  is  within  these  Umits)  then  reduce  the 
temperature  (as  the  configuration  is  in  equlibrium).  Otherwise  go  to  step  6. 

6.  If  the  number  of  groups  attempted  so  far  at  the  current  temperature  is  less 
than  16,  then  go  to  step  2.  Else  reduce  the  temperature.  (This  fixes  an  upper 
bound  for  the  ntunber  of  updates  at  any  temperature  at  4000). 

So,  we  observe  from  the  above  cooUng  schedule  that  the  niunber  of  total  updates 
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can  be  different  at  different  temperatures  depending  on  how  fast  the  configuration 
equilibrates  at  each  temperature.  The  results  obtained  by  executing  the  annealing 
algorithm  with  tliis  modified  cooling  schedule  showed  that  the  equilibrium  ratio 
was  within  the  above  mentioned  limits  at  almost  each  temperature  and  for  some  of 
the  temperatures,  this  limit  was  reached  after  only  about  500  updates,  and  therefore, 
1000  updates,  as  before,  would  be  iinnecessary  at  these  temperatures.  This  shows 
that  this  schedule  automatically  determines  the  number  of  required  updates  at  each 
temperaure,  thus  making  it  more  appropriate  for  our  purpose.  All  the  results  that 
follow  are  based  on  this  modified  cooling  schedvde. 


We  see  above  that  the  configuration  at  a  temperature  is  assumed  to  have 
attained  equilibriiun  if  the  equilibrium  ratio  is  between  0.47  and  0.53.  In  fact,  we 
use  a  parameter  called  ratio-tolerance  (see  page  24)  that  determines  how  close  to 
0.5  the  equilibrium  ratio  should  get  before  the  configuration  can  be  assumed  to 
be  in  equilibrimn,  and  equilibrium  is  assumed  if  the  equilibrium  ratio  is  within 
(0.5  -  ratio-tolerance)  and  (0.5  +  ratio-tolerance).  So,  in  the  above  schedule, 
ratio-tolerance  is  0.03.  Better  results  are  expected  for  lower  values  of  this  ratio- 
tolerance  as  that  would  mean  a  better  approximation  to  equilibrimn,  and  the 
results  are  studied  for  a  lower  and  a  higher  values  of  this  parameter,  set  at  0.015  and 
0.05  respectively.  The  results  for  these  three  different  values  of  ratio-tolerances 
are  shown  below: 


Ratio-tolerance 

C 

3st  Function 

St 

^ndard  Error 

Execution  Time 

0.015 

4.6E-06 

8.5E-07 

45  min. 

0.03 

5.8E-06 

7.1E-07 

17  min. 

0.05 

7.6E-06 

2.2E-07 

13  min. 
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We  comment  on  these  results  below. 

The  final  weight  distribution  recovered  by  this  algoritlim  for  these  different 
values  of  ratio-tolerances  are  shown  below.  The  results  here  are  averages  of  30 
runs  for  each  value  of  the  ratio- tolerance. 

Ratio-tolerance  =  0.015 

0  0  0  0  0 

0  0  0  0  0 

0.01  ±0.01     0.04  ±0.03     0.07  ±0.04  0.03  ±  0.03  0.01  ±  0.02 

0.09  ±0.08     0.14  ±0.11     0.18  ±0.11  0.15  ±0.13  0.07  ±  0.07 

0.29  ±0.17     0.29  ±0.26     0.49  ±  0.30  0.38  ±  0.24  0.30  ±  0.21 

Ratio-tolerance  =  0.03 

0  0  0  0  0 

0  0  0.01  ±0.01  0  0 

0.01  ±0.01     0.04  ±0.04     0.05  ±0.03  0.03  ±  0.02  0.01  ±  0.02 

0.11  ±0.1       0.14  ±0.11     0.16  ±0.13  0.12  ±0.11  0.08  ±  0.04 

0.23  ±0.13     0.35  ±0.26     0.48  ±  0.28  0.42  ±  0.29  0.30  ±0.19 

Ratio-tolerance  =  0.05 

0  0  0  0  0 

0  0  0.01  0  0 

0.01  ±0.02     0.03  ±0.03     0.06  ±  0.05  0.04  ±  0.03  0.03  ±  0.03 

0.1  ±0.08       0.14  ±0.11     0.13  ±0.10  0.14  ±  0. 10  0.09  ±  0.07 

0.30  ±0.20     0.38  ±0.24     0.45  ±  0.29  0.36  ±  0.20  0.28  ±  0.19 
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Comparing  the  execution  times  obtained  for  these  different  values  of  ratio- 
tolerances,  we  find  that  the  execution  time  goes  up  from  13  minutes  to  about 
17  minutes  when  the  ratio-tolerance  goes  down  from  0.05  to  0.03.  But  the  ratio- 
tolerance  of  0.05  produces  a  cost  functionwhich  is  about  13  times  the  cost  function 
for  a  ratio-tolerance  of  0.03.  Decreasing  the  ratio- tolerance  from  0.03  to  0.015 
reduces  the  cost  function  by  half  whereas  the  execution  time  goes  up  from  17 
minutes  to  45  minutes,  which  is  a  considerable  increase  in  the  execution  time.  It 
should  also  be  noted  from  the  data  presented  above  that  the  average  final  cost 
functions  corresponding  to  ratio-tolerances  of  0.015  and  0.03  overlap  and  hence 
they  are  statistically  indistinguishable,  though  the  smaller  ratio-tolerance  has  an 
execiition  time  which  is  more  than  two  times  that  for  the  bigger  ratio-tolerance. 
This  suggests  that  given  a  choice  between  the  values  of  0.015  and  0.03,  one  should  use 
the  larger  of  these  values.  But  we  also  observe  that  the  cost  functions  corresponding 
to  the  ratio- tolerances  of  0.03  and  0.05  do  not  overlap  indicating  the  superiority 
of  0.03  as  compared  to  0.05.  We,  therefore,  decided  that  a  value  of  0.03  would  be 
a  sensible  trade-off  between  execution  time  and  ratio- tolerance.  The  subsequent 
runs  are  all  for  a  ratio- tolerance  of  0.03.  We  see  here  that  this  parameter  is 
quite  important  because  if  chosen  inappropriately,  it  can  increase  the 
execution  time  by  a  great  amount  without  producing  any  significantly 
improved  results. 

The  present  annealing  algorithm  has  been  compared  with  the  zero  temperature 
case  in  which  only  those  moves  are  accepted  which  lower  the  C,  and  we  find  that 
the  results  from  the  zero  temperature  trials  are  inferior  to  those  of  the  annealing 
algorithm  both  quahtatively  and  quantitatively.  Quantitatively,  we  find  that 
the  average  cost  function  is  about  10  times  greater  for  the  case  of  zero 
temperature,    than   that    for    the   annealing   algorithm.     The  average  cost 
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function  in  the  case  of  zero  temperature  trials  is  of  the  order  of  l.OE-05  whereas 
that  for  the  annealing  algorithm  is  found  to  be  about  10  times  smaller  than  this. 
Qualitatively,  we  And  that  the  zero  temperature  case  produces  an  almost 
uuiform  average  weight  distribution  among  the  columns  (as  c£in  be  seen 
from  the  average  configiu-ation  recovered  by  the  zero  temperature  case  presented 
below),  whereas  the  annealing  algorithm  can  eliminate  the  two  bordering 
columns  which  is  a  definite  qualitative  improvement  in  the  recovery  of 
the  configuration.  But  there  is  another  aspect  of  the  zero  temperature  case  that  is 
to  be  noted  here,  and  that  is,  the  execution  time  for  the  annealing  algorithm 
is  about  ten  times  more  than  that  for  the  zero  temperature  case.  The  zero 
temperature  case  takes  about  2  minutes  to  finish  execution  whereas  the  annealing 
algorithm  takes  about  20  minutes.  Hence,  in  a  given  amount  of  time,  one  can 
execute  the  zero  temperatine  case  many  more  times  than  the  aimeaUng  algorithm. 
The  average  configuration  obtained  by  executing  the  zero  temperature  algorithm  30 
times  is  shown  below: 

Results  obtained  by  the  zero  temperature  case: 
Average  of  30  trials: 

0  0  0  0  0 

OiO.Ol  0.1  ±0.01  0.01  ±0.01  0.01  ±0.01  0.01  ±0.01 

0.04  ±0.03  0.05  ±0.03  0.05  ±  0.03  0.05  ±  0.03  0.05  ±  0.03 

0.13  ±0.05  0.12  ±0.03  0.11  ±  0.03  0.12  ±0.05  0.14  ±0.04 

0.27  ±0.07  0.26  ±0.06  0.25  ±  0.05  0.25  ±  0.06  0.27  ±0.06 

The  configuration  corresponding  to  the  lowest  cost  function  is  shown  on  the 
next  page. 
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Configuration  corresponding  to  the  lowest  cost  function: 

0  0  0  0  0 

0.04     0.06     0  0.02     0.04 

0.02     0.04     0.04     0  0.07 

0.04     0.17     0.11     0.12     0.2 
0.17     0.22     0.23     0.21     0.14 

Next,  the  effects  of  different  step  sizes  and  different  starting  configurations  are 
studied  for  tiiis  annealing  algorithm.  The  step  size  is  studied  first.  For  this  purpose, 
the  other  parameters  are  fixed  at  the  following  values: 

Starting  temperature  =  5 

Cooling  rate  =  0.9 

Ratio-tolerance  =  0.03 

Accepteince-tolerance  =  0.01 

Number  of  updates  per  block  =  250 

Maximimi  nmnber  of  updates  at  a  temperature  =  4000 

With  the  above  set  of  parameters,  the  step  sizes  are  taken  as  0.1,  0.05,  0.02, 
0.01,  and  0.005.  For  each  of  these  step  sizes,  the  annealing  algorithm  is  executed  30 
times,  and  the  standard  error  among  the  cost  functions  in  the  final  configuration 
output  by  the  annealing  algorithm  for  each  of  these  30  trials  is  computed  by  grouping 
the  cost  functions  into  groups  containing  5  error  elements,  to  get  a  total  of  6  groups 
to  compute  the  standard  error  from.  The  cost  function  for  each  step  size  along 
with  the  standard  error  in  these  cost  functions  are  shown  below: 
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Step  size 

Cost  Function 

Standard  Error 

0.005 

6.0E-06 

3.7E-07 

0.01 

5.6E-06 

6.3E-07 

0.02 

4.5E-06 

7.2E-07 

0.05 

4.3E-06 

6.7E-07 

0.10 

5.8E-06 

7.1E-07 

One  would  guess  that  a  smaller  step  size  would  produce  a  better  recovery,  and 
hence  yield  a  lower  cost  function,  but  the  data  above  do  not  clearly  show  that.  This 
could  be  due  to  the  fact  that,  as  the  step  size  gets  smaller,  the  value  of  AC  (as 
described  in  Chapters  before)  also  becomes  smaller  and  hence  gets  affected  more  by 
the  round-off  error  arising  because  of  the  finite  precision  of  the  machine.  The  weight 
distribution  recovered  by  this  algorithm  for  each  of  these  step  sizes  are  shown  below: 
Step  size  =0.05 

0  0  0  0  0 

0  0  0.01  ±0.01  0  0 

0.01  ±0.02  0.05  ±0.03  0.06  ±  0.04  0.04  ±  0.04  0.01  ±  0.02 

0.10  ±0.09  0.14  ±0.10  0.18  ±0.10  0.13  ±  0.08  0.09  ±  0.07 

0.41  ±0.24  0.27  ±0.20  0.39  ±  0.18  0.31  ±  0.20  0.36  ±  0.36 


Conflgnration  corresponding  to  the  lowest  cost  funrtion: 

0  0  0  0  0 

0  0  0  0  0 

0  0.07  0.07  0.02  0 

0.11  0.07  0.32  0.03  0.03 

0.22  0.37  0.41  0.5,'3  0.39 


35 


step  size  =  0.02 

0  0  0  0  0 

0  0  0.02  ±0.01  0  0 

0  0.02  ±0.02  0.07  ±0.05  0.07  ±  0.04  0.02  ±  0.02 

0.04  ±0.06     0.05  ±0.04  0.11  ±  0.07  0.13  ±  0.09  0.08  ±0.08 

0.64  ±0.19     0.65  ±0.27  0.20  ±  0.13  0.20  ±  0.12  0.40  ±  0.20 

Configuration  corresponding  to  the  lowest  cost  function: 

0  0  0  0  0 

0  0.0  0  0  0 

0  0.04  0.15  0.05  0.02 

0  0  0.31  0.07  0.19 

0.53  0.83  0.04  0.02  0 

Step  size  =  0.01 


0 

0 

0 

0 

0 

0 

0 

0.02 

0 

0 

0  0.01  ±0.01     0.06  ±0.02     0.06  ±0.03     0.02  ±  0.02 

0.03  ±0.05     0.03  ±0.02     0.09  ±  0.04     0.12  ±  0.07     0.06  ±  0.04 
0.74  ±0.13     0.76  ±0.19     0.13  ±  0.06     0.14  ±0.05     0.56  ±  0.15 


Configuration  corresponding  to  the  lowest  cost  function: 

0  0  0  0  0 

0  0  0.01  0  0 

0  0.01  0.1  0.05  0 

0  0.02  0.05  0.34  0.04 

0.76  0.75  0.12  0.11  0.41 
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step  size  =  0.005 

0                     0 

0 

0 

0 

0                     0 

0.03 

0 

0 

0  0  0.06  0.05  ±0.01     0.03  ±0.02 

0.03  ±0.04    0.02  ±0.01     0.09  ±  0.02     0.11  ±  0.04     0.05  ±  0.03 
0.72  ±0.09     0.83  ±0.08     0.15  ±0.02     0.17  ±0.03     0.53  ±  0.09 

Configuration  corresponding  to  the  lowest  cost  function: 

0  0  0  0  0 

0  0  0.02  0  0 

0  0  0.1  0.05  0.01 

0.07  0  0.09  0.08  0.05 

0.55  0.9  0.19  0.21  0.54 

Looking  at  the  cost  functions  for  the  different  step  sizes,  we  observe  that  it 
is  ahiiost  impossible  to  prefer  a  step  size  based  on  these  results  because  the  cost 
functions  for  these  step  sizes  overlap,  indicating  that  one  cannot  be  shown  to  be 
clearly  superior  to  another.  But  we  have  chosen  0.05  to  be  the  step  size  for  this 
algorithm  considering  the  trade  off  between  execution  time  and  final  configuration. 
Looking  at  the  weight  distribution  recovered  by  the  algorithm  for  the  different  step 
sizes  shown  above,  we  find  that  we  get  a  qualitatively  improved  result  when  the  step 
size  is  smaller,  as  is  shown  by  the  lower  and  lower  weights  for  top  layers  (where  the 
actual  weights  are  zeroes)  for  smaller  and  smaller  step  sizes. 

We  also  plotted  the  graph  of  acceptance  vs.  temperature  for  all  of  these  step 
sizes.  These  graphs  are  produced  on  the  next  page.  It  is  found  that  as  the  step  size 
becomes  smaller,  the  acceptance  values  show  a  more  pronounced  rlpviation  from 
their  monotonically  decreasing  behavior  Ijy  showing  an  increase  in  acceptance  at 
intermediate  temperature  range  before  decreasing  monotonically  again,  as  can  be 
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seen  in  the  graphs  for  step  sizes  of  0.005,  0.01,  and  0.02.  This  behavior  may  occur 
beacuse  the  average  change  in  the  cost  function  (AC)  computed  at  each  temperature 
does  not  decrease  at  the  same  rate  as  the  temerature. 

It  was  then  hypothesized  that  similar  acceptance  characteristic  could  be 
observed  if  the  step  size  is  fixed  at  some  value  and  the  initial  starting  configuration 
is  varied  over  a  range,  because  starting  with  a  small  step  size  is  quite  equivalent  to 
having  a  larger  step  size  but  also  a  larger  deviation  from  the  actual  configuration. 
Tliis  wotdd  show  the  correlation  between  step  size  and  the  starting  configuration. 
It  was  decided  to  start  with  a  uniform  starting  configuration  with  the  same  value 
in  each  of  the  boxes.  The  results  are  shown  for  a  step  size  of  0.05  (fixed)  and  the 
different  starting  configurations  as  shown  below. 

Weight  per  box  Cost  Function  Standard  error 

0.00  4.002E-06  3.520E-07 

0.04  4.331E-06  5.893E-07 

0.08  3.792E-06  6.906E-07 

0.8  4.897E-06  5.239E-07 

1.6  4.087E-06  3.433E-07 

4.0  4.571E-06  1.082E-06 

8.0  4.745E-06  1.364E-06 

As  can  be  seen  from  the  above  list  of  cost  functions,  the  standard  error  becomes 
comparable  to  the  data  when  the  weight  per  box  is  4.0  or  above  (which  is  a  total 
starting  value  of  50  times  (or  more)  the  actual  total  value).  This  means  that  the 
present  algorithm  does  not  produce  reliable  results  when  we  start  with  a  total 
amount  of  the  weight  which  is  more  than  50  times  the  actual  total  value  present 
inside  the  volume.  It  was  also  found  that  the  algorithm  almost  never  required  more 
than  4000  (the  preset  upper  bound)  updates  before  equilibrating.   At  most  of  the 
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temperatures,  it  equilibrated  well  below  400  updates. 

The  average  weight  distribution  recovered  by  this  algorithm  for  the  different 
uniform  starting  configurations  mentioned  above  are  shown  below: 

Initial  total  weight  of  0.00  distributed  uniformly  among  the  25  boxes: 

0  0  0  0  0 

0  0  0  0  0 

OiO.Ol  0.05  ±0.03  0.06  ±0.04  0.05  ±  0.04  0.01  ±  0.01 

0.11  ±0.08  0.16  ±0.11  0.16  ±0.11  0.20  ±0.11  0.10  ±0.09 

0.31  ±0.19  0.35  ±0.27  0.34  ±0.20  0.32  ±  0.18  0.26  ±  0.20 

Configuration  corresponding  to  the  lowest  cost  function: 

0  0  0  0  0 

0  0  0  0  0 

0.01  0.06  0.09  0.04  0 

0.03  0.07  0.23  0.13  0.11 

0.4  0.26  0.67  0.36  0.13 

Initial  total  weight  of  2.00  distributed  uniformly  among  the  25  boxes: 
0  0  0  0  0 

0  0  0.01  0  0 

0.02  ±0.02  0.05  ±0.03  0.06  ±  0.04  0.03  ±  0.03  0.02  ±  0.03 

0.09  ±0.08  0.15  ±0.09  0.19  ±0.10  0.17  ±0.11  0.10  ±  0.09 

0.28  ±0.18  0.35  ±0.20  0.36  ±  0.23  0.36  ±  0.23  0.25  ±0.17 

Configuration  corresponding  to  the  lowest  cost  function: 

0  0  0  0  0 

0  0  0  0  0 

0.04  0.01  0.12  0.02  0 

0.03  0.15  0.18  0.14  0.07 

0.19  0.48  0.25  0.72  0.18 
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IniHnl  total  weight  of  20.00  distribtited  uniformly  among  the  25  boxes: 
0  0  0  0  0 

0  0  0.01  ±0.01     0  0 

0.02  ±0.03  0.04  ±0.03  0.06  ±  0.05  0.05  ±  0.05  0.01  ±  0.02 
0.08  ±0.08  0.14  ±0.10  0.18  ±0.13  0.12  ±  0.10  0.15  ±0.11 
0.38  ±0.25     0.35  ±0.23     0.36  ±  0.23     0.29  ±  0.19     0.28  ±  0.21 

Configuration  corresponding  to  the  lowest  cost  function: 

0  0  0  0  0 

0  0  0  0  0 

0.03  0.01  0.15  0.06  0 

0.11  0.05  0.23  0.1  0 

0.18  0.61  0.13  0.55  0.37 

Initial  total  weight  of  40.00  distributed  uniformly  among  the  25  boxes: 
0  0  0  0  0 

0  0  0.01  0  0 

0.02  ±0.02  0.03  ±0.03  0.07  ±0.07  0.04  ±  0.03  0.01  ±  0.02 
0.10  ±0.08  0.13  ±0.10  0.16  ±0.11  0.17  ±0.13  0.10  ±  0.08 
0.35  ±0.13     0.35  ±0.18     0.42  ±  0.20     0.27  ±  0.20     0.32  ±  0.23 

Configuration  corresponding  to  the  lowest  cost  function: 

0  0  0  0  0 

0  0  0  0  0 

0.05  0.02  0.09  0.08  0.02 

0  0.06  0.25  0.15  0.09 

0.27  0.61  0.53  0.26  0.06 
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Initial  total  weight  of  100.00  distributed  uniformly  among  the  25  boxes: 
0  0  0  0  0 

0  0  0.01  ±0.01  0  0 

0.02  ±0.03  0.04  ±0.03  0.07  ±0.05  0.05  ±  0.04  0.02  ±  0.02 

0.07  ±0.09  0.12  ±0.09  0.15  ±0.11  0.15  ±0.12  0.08  ±0.07 

0.39  ±0.31  0.42  ±0.35  0.34  ±  0.21  0.31  ±  0.31  0.35  ±0.27 

Configuration  corresponding  to  the  lowest  cost  function: 

0  0  0  0  0 

0  0  0  0  0 

0.01  0.05  0.02  0.03  0 

0.01  0.14  0.41  0.19  0.15 

0.11  0.62  0.27  0.38  0.05 

Initial  total  weight  of  200.00  distributed  uniformly  among  the  25  boxes: 
0  0  0  0  0 

0  0  0.01  0  0 

0.01  ±0.02  0.04  ±0.05  0.07  ±  0.04  0.04  ±  0.03  0.01  ±  0.02 

0.09  ±0.10  0.11  ±0.10  0.16  ±0.11  0.12  ±0.10  0.07  ±0.07 

0.35  ±0.37  0.42  ±0.39  0.33  ±  0.26  0.3  ±  0.335  0.41  ±  0.35 

Configuration  corresponding  to  the  lowest  cost  function: 

0  0  0  0  0 

0  0  0.03  0.01  0 

0  0.04  0.01  0.04  0 

0  0  0.13  0.23  0.17 

0.31  0.05  0.27  0.29  0.17 

The  graphs  of  acceptence  vs.   temperature  show  that  the  accepteiice  deviates 
from  its  monotonicaUy  decreasing  behavior  (as  it  did  for  the  lower  step  sizes  before) 
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for  larger  deviations  in  the  total  starting  weight.  This  shows  that  a  similar  effect  can 
be  produced  by  varying  the  step  size  only  or  by  varying  the  starting  configuration 
only,  Mid  hence  these  two  parameters  are  closely  related  to  each  other. 

Another  interesting  aspect  observed  by  comparing  the  final  distribution  of 
weights  obtained  by  the  aimealing  algorithm  starting  with  a  configuration  skewed  to 
the  left  and  a  luiiform  starting  configuration  is  that  the  skewed  starting  configuration 
also  produces  a  skewed  final  average  configuration,  whereas  a  uniform  starting 
configuration  does  not  show  einy  average  skewed  behavior  in  the  final  configuration. 
Tliis  effect  depends  on  the  choice  of  the  step  size.  The  effect  is  very  pronounced  for 
large  step  sizes,  eind  it  diminishes  as  the  step  size  gets  smaller.  The  reason  behind 
this  behavior  is  not  yet  fully  understood  and  has  not  been  investigated  in  this  thesis. 

Application  of  the  algorithm  on  other  configurations 

We  have  so  iar  been  applying  the  annealing  algorithm  to  test  its  efficiency  to 

recover  an  actual  weight  distribution  where  the  weights  were  unity  only  in  the  fourth 

layer  from  top  and  in  the  second  and  fourth  columns,  with  zero  everywhere  else. 

Now,  we  apply  the  algorithm  on  other  configurations  as  shown  below. 

Configuration  1: 

0     10     0     0 

0  0  0  0  0 

0  0  0  0  0 

0  0  0  0  0 

0  0  0  0  0 


42 


Average  configuration  recovered  by  tlie  algoritiim: 

(Average  of  30  trials) 

0  0.99  0  0  0 

0        0  0  0  0 

0       0  0  0  0 

0       0  0  0  0 

0       0  0  0  0 


Configuration  2: 


0  0     0  0  0 

0  10  0  0 

0  0     0  0  0 

0  0     0  0  0 

0  0     0  0  0 


Average  configuration  recovered  by  tlie  algoritlini: 

(Average  of  30  trials) 

0  0.02  0  0  0 

0.09  ±0.03  0.55  ±0.07  0.04  ±0.02     0  0 

0.15  ±0.11  0.27  ±0.12  0  0  0 

0±0.02  0±0.01  0  0  0 

0  0  0  0  0 

Configuration  corresponding  to  tlie  lowest  cost  function: 

0  0  0  0  0 

0  0.72  0.01  0  0 

0.32  0.05  0.01  0  0 

0  0  0  0  0 

0  0  0  0  0 
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Configuration  3: 


0  0     0  0  0 

0  0     0  0  0 

0  10  0  0 

0  0     0  0  0 

0  0     0  0  0 


Average  conflguration  recovered  by  the  algorithm: 

(Average  of  30  trials) 

0  0  0  0  0 

OiO.Ol  0.08  ±0.02  0.03  ±0.01  0  0 

0±0.08  0.18  ±0.09  0.04  ±0.02  0  0 

0.19  ±0.14  0.14  ±0.09  0.06  ±0.04  0  0 

0.22  ±0.18  0.17  ±0.12  0.04  ±0.04  0  0 

Configuration  corresponding  to  the  lowest  cost  function: 

0  0  0  0  0 

0  0.1  0.01  0  0 

0.09  0.29  0.05  0  0 

0.17  0.16  0.04  0  0 

0.21  0.03  0.04  0  0 


Configuration  4: 


0  0     0  0  0 

0  0     0  0  0 

0  0     0  0  0 

0  10  0  0 

0  0     0  0  0 
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Average  configuration  recovered  by  the  algorithm: 

(Average  of  30  trials) 

0  0  0  0  0 

0  0  0  0  0 

0.05  0.04  0.01  0  0 

0.13  0.12  0.10  0.01  0 

0.29  0.20  0.12  0.06  0.01 

Configuration  corresponding  to  the  lowest  cost  function: 

0  0  0  0  0 

0  0.01  0  0  0 

0  0.02  0.01  0  0 

0.09  0.3  0.03  0  0 

0.4  0.17  0.14  0  0 

It  is  seen  from  the  results  with  the  above  configurations,  that  the  algorithm 
reproduces  the  configuration  1  quite  accurately. 

For  configuration  2,  the  algorithm  fails  to  produce  the  exact  configuration 
(unlike  the  previous  case)  but  it  produces  a  higher  weight  in  the  proper  colunuv  than 
the  other  coltunns.  In  this  case  the  algorithm  cannot  produce  the  right  box,  but 
definitely  identifies  the  right  column.  Besides  if  we  look  at  the  result  corresponding 
to  the  lowest  cost  function  we  find  that  it  gives  a  very  good  estimate  of  the  exact 
location  of  the  weight. 

For  configuration  3,  the  algorithin  fails  to  determine  even  the  correct  column  in 
most  of  the  cases,  but  it  does  so  in  the  case  having  the  lowest  cost  fuiirtion.  Here, 
the  algorithm  is  only  capable  of  showing  that  most  of  the  weight  is  skewed  to  the 
left  side  of  the  volmne,  and  towards  the  bottom  layers.   It  is,  therefore,  seen  that 
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the  present  algorithm  is  capable  of  recovering  the  weights  for  the  two  topmost  rows, 
but  does  not  do  so  for  the  layers  deeper  down.  But,  it  can  produce  liigher  weights 
in  the  proper  columns  than  in  the  other  ones,  even  for  the  weights  in  the  deeper 
layers. 

One  interesting  aspect  to  be  noted  from  the  results  of  this  set  of  observations 
is  that  the  algorithm  in  all  the  above  three  cases  gives  an  approximate  estimate  of 
the  total  amount  of  weight  present  in  the  volume  within  an  accuracy  of  about  25 
percent.  This  surely  depends  on  the  a  priori  assmnption  that  the  weights  cannot 
be  negative. 

Further  work  in  this  area  can  be  carried  out  to  modify  the  algorithm  so  as  to 
recover  the  correct  weights  even  for  the  deeper  layers  more  efficiently.  The  most 
important  modifications  are: 

1.  The  step  size  can  be  varied  during  the  algorithm,  making  it  gradually  smaller 
during  the  algorithm,  so  as  to  make  the  adjustments  in  the  weight  distribution  finer 
as  the  configuration  comes  closer  and  closer  to  the  frozen  state. 

2.  Besides,  the  updating  scheme  of  the  weight  at  any  temperature  can  be 
modified  to  make  more  sophisticated  weight  transfers  among  the  diff'erent  boxes  to 
drive  the  configuration  towards  the  desired  state  more  efficiently.  For  example,  the 
transferring  scheme  can  be  modified  to  incorporatr.  wriglit  rf^distribution  bptwppn 
three  boxes,  or  weight  transfer  from  one  box  to  another,  instead  of  only  adding  (or 
subtracting)  weights  to  a  box,  as  has  been  done  in  the  current  algorithm. 
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APPENDIX 


oro^ram    gener  a  te_<j  a  ta 

program  to  compute  the  three  dimensional  integral   over  the 
a  predefined  box,  uiith  Simpson's  rule. 

external  tunc 
open(unit=7,status='old',form='unformatted') 

ujrite(6,*)'  type  the  x-coordina tes  of  the  end  points  of  slab: 
ii)riteC6.*)  *  Ctype  the  louier  value  first)' 

read(5,*)xlimitlou)fXlimithi 
iiiriteC7)xlimitlou).xlimithi 

uiriteC6,*)'  type  the  y-coordinates  of  the  end  points  of  slab: 
'Jirite(6,!;')  '  Ctype  the  loujer  value  first) 

read(5i*)ylimitlou,ylimithi 
ujriteC7)yliii>itloiii,ylimithi 

ujrite(6,«)'  type  the  z-coordina tes  of  the  end  points  of  slab: 
uirite(6,*)'  Ctype  the  loiuer  value  first)' 


K, 


readC5»*)2limitlou),zlimitni 
uiriteC7)zlimitloui,zlimithi 

u!riteC6,#)'    type    the    number    of    points    for    the    x-integra tion :  ' 
iuriteC6.*) '    Cvalue    must    be    an    even    integer)' 


readC5,«)nx 

u)riteC6,*)'  type  the  number  of  points  for  the  y-integr ation :  ' 
uriteC6,*)'  Cvalue  must  be  an  even  integer)' 

r eadC  5  ,*)ny 

uiriteC6,*)'    type    the    number    of    points    for    the    z-int  egration :  ' 
ujriteC6i«)'    Cvalue    must    be    an    even    integer)' 


readC5,*)nz 
UjriteC7)nx  ,ny  ,nz 


uiriteC6,*)'    type    the    number    of    boxes    in    the    x,y,z    direction; 

readC5,*)nxboxjnybox,nzbox 
«iriteC7)nxtox,nybox,nzbox 


compute    the    box    size    on    each    axis: 

xeps  =  Cxlimithi  -  xlimit  loui)/f  loat  Cnxbox  ) 
yeps  =  Cylimithi  -  ylimit lotti)/f loatCnybox) 
zeps  =  Czlimithi  -  zlimitloui)/f  loat  Cnzbox ) 
iiiriteC7)xeps,yeps,zeps 

vary    "a"    and    "b": 

do  50  iax  =  1  •  1 
do  48  ibx  -  1,11 
ueight  =  !• 
xa  =  floatCiax  -  1) 


xb  =  floatdbx  -  1) 

ya  =  0. 

yb  =  0. 

za   =  0. 

zb  =  0. 
i))rit6C7)xaiya.za,xb,yb,zb 

iupit«C7)iiieisht 

do  300  iz  =  l.nzbox 
do  200  iy  =  linybox 

do  100  ix  =  IfOxbox 


c  compute  the  end  points  of  the  box: 

^  xl  =  floatCix  -1)  »  xeps   +  xlimitlou 

x2  =  xl  +  xeps 

yl  =  floatCiy  -n  *  yeps  *    ylimitloui 
y2  =  yl  +  yeps 
21  =  floatCiz  -1)  *  zeps   +  zlimitlour 

call  sLp3or.Cxa.ya.za.xb,yb.zb.xl.x2,yl.y2.zl.z2.nx,ny. 

^  nzivalue) 

uirite(7)ixiiytiz. value 
100  continue 

200       continue 
300    continue 

43     continue 
50     continue 

stop 

end 


c 


subroutine  sinp3di.Cxa. ya . za .xb ,y b ,zb . xl , x2 , yl .y2 ,zl , z2.nx. ny , 
^  nz  t  sum) 

coirpute  the  step  size  for  integration: 

hx  =  (x2-xl)/nx 
hy  =  (y2-yl)/ny 
hz  =  (z2-zl)/nz 


sum 


0.0 


do  300  ix  =  1.  nx  ♦  1 

do  200  iy  =  1.  ny  +  1 

if  (ix  .eq.  1)   30  to  1 

if  (ix  .eq.  (nx  +  D)  go  to  1 

ixl  =  ix/2 

ix2  =  ixl  *  2 

ix3  =  ix  -  ix2 

if  (ix3  .eq.  0)  30  to  2 

factorx  =  2. 

go  to  3 

1  factorx  =  1. 
go  to  3 

2  factorx  =  *. 


if  ciy  .aq*  i>  90  ^°   ^ 

if  (iy  .eq.  Cny  ♦  1))  90  to  5 

iyl  =  iy/2 

iy2  =  iyl  *  2 

iy3  =  iy  -  iy2 

if  (iy3  ••q.  0)  go  to  6 

factory  =  2. 


go  to  7 
5  factory  =  1. 

go  to  7 
j  factory  =  *. 
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vx  =  xl  ♦  floatCix  -  l)*hx 
vy  =  yl  +  floatCiy  -  l)*hy 


:u:i:it  :  ;:.init  ^CfuncCvx.vy.zl.xa.ya.za.xb.yb.zb) 
^  ♦  func(vx,vy,z2,xa,ya,za,xb,yb,zb)) 

^  «  factorx  «  factory 

vz  =  zl  ♦  hz 

suminit  =  suminit  ♦  '••0 
^  *  func(vx,vy,vz,xa,ya.za,xb,yb,zb) 

^  #  factorx  f    factory 

SUT.Z   =   0. 

do  100  iz  =  3,  (nz-1).  2 
vz  =  zl  ♦  floatCiz  -  l)«hz 

V  *  func<vx,vy,vz,xa,ya,za,xb,ybizb) 

suirz    =    sumz    ♦    2.    *    v 
vz    =    zl    ♦    fioat(iz)*hz 

V  =  funcCvx,vy,vr,xa.ya,za,xb,yb,zb) 

suirz  =  sumz  ♦  4 .  >:»  v 
100  continu* 

suirz  =  sumz  *  factorx  «  factory 

SUIT  =  sum  ♦  suminit  ♦  sumz 
200         continue 
300    continue 

sum  =  sum  *  hx  *  hy  *  hz  /  C3.  #  3.  *  3.) 

e       call  lib$shoiu_timer 

return 
end 

function   funcCvx,vy,vz,xa,ya,za,xb,yb,zb) 


denoml 


Cxa  -  vx)*«2  ♦  (ya  -  vy)**2  ♦  (za  -  vz)**2 
denoml  =  sqrtCdenoml)  u,1ft*2 

denom2  =  Cxb  -  vx)«*2  ♦  <yb  -  vy)**2  *  Czb    vz)**2 
denom2  =  sqrtCdenom2) 
func  =  1. /(denoml  «  denom2) 
return 
end 


<^ 


c 
C 
c 

c 
c 
c 
c 
c 
c 
c 
c 
c 
c 
c 
c 
c 
c 


prograir  anneal 

this  cooa    reads  the  values  of  initial  seeds  for  the  random  number 
generator  from  the  file  for008.dat, 

implicit  real«8  Ca-h,o-u  » ui-z) 

real*<>  valcomb 

real^'i    rannyu 

diirension    valu€Cllill,25,25)iSuniinitCll,ll)fSumtrial(ll»ll) 

dimension  sumneiii(lltU)i«eisht(25i2'.),iieightn9iiiC25.24)iiran(4) 

dimension  iuiC25)fValcombC11.11.5|5).sumt(ll»ll) 

dimension  tryitO)  »  iranl  (4)  i  ixbxC  25  )  t  i2bx(25  ) 

value  =  the  values  of  integral  for  each  box  read  from  a  file  that  uias 

computed  before* 
suminit  =  the  integral  over  all  the  boxes  for  a  receiver  and  transmitt* 

location. 
sumtrial  =  the  integral  over  all  the  boxes  computea  after  the  the  boxes 

are  assigned  random  weights, 
sumneuj  =  the  integral  over  all  the  boxes  computed  after  updating  the 

random  weights  of  the  boxes, 
valcomb  =  the  precomputed  three  dimensional  integral  for  the  kernel 

computea  over  each  of  the  boxes  by  three  dimensional 

Simpson's  rule  by  the  FORTRAN  program  called  GENERATE  as 

presented  before. 
weight  =  random  weights  assigned  to  the  boxes. 
uieightneiu  =  the  updated  random  lueight. 
iran  =  the  seeds  for  the  random  number  generator. 

data  if ive/5/,i2ero/0/ 
data  accept_tolerance/0. 01/ 

open  the  necessary  files: 

open(unit  =  7,  f ile  = 'f inal.accept .out  '  f  status  ='ne»i') 

the  final  configuration  is  written  into  unit  7. 
open(unit  =  17,file='val.dat',status= 'old ',forni= 'unformatted') 

unit  17  contains  the  precomputed  values  of  "valcomb". 
open(unit=13» status='ola') 

unit  18  contains  the  seeds  for  the  random  number  generator. 
open(unit  =  19,  file  = 'config  .  dat '. status  «'old')  ' 

unit  19  contains  the  uieights  from  previous  runs. 

the  final  seeds  for  the  setrn  are  written  in  unit  18. 

rewindCunit= 18) 

read(18,*)iran 

readC5,*)stepft,fac,ilp2 

tstart  =  t 

u/riteC7,«)  'iran  ='f  iran 

step  =  the  step  size  for  computing  the  new  random  weights. 


^ 


1020 
1025 
1030 


1035 

c 
c 

c 


34 
36 
38 
40 

c 

45 


171 

c 
c 
c 
c 
c 


urite  out  the  necessary  input  parameters  for  check  up: 

u)rite(6 »«)  '  iran=',  iran 
ujriteC6.1020) 

format (3x, 'Accept_tolerance') 
u)riteC6fl025)accept_tolerance 
formatClx,fl4.5,5x,fl4.5) 
ii;rite(6,1030) 

formatClx  , 'Starting  Step ',  2x  , 'Starting  teiiip'>2xi 
^  'Teiiip_fac',5xt 'Niter '; 

ii)rite(6,1035)step,  tf  fac,  ilp2 
formatClx,fl0.5,5x,fl0.5,5x,f7.4,3x,i8) 


60 
65 

c 
c 
c 


74 

75 

c 

c 
c 
c 


read 

and 

do  4 

do  3 

do  3 

do  3 

read 

cont 

cont 

cont 

cont 


the 
tr  an 
0  ix 
8  ix 
6  ix 
4  iz 
(17) 
inue 
inue 
inue 
inue 


initial  values  of  each  big  box  for  each  combination  of  receivei 
smitter . 
b  =  1,11 
8  =  1 , ixb 
count  =  1,5 
count  =  1,5 
valcombCixa,ixb,ixcount,izcount> 


do  45  i  =  1,4 
iranlCi)  =  iran<i) 
call  se trnCiranl ) 

call  second(tl) 

icount  =  1 

continue 

uirite(6,*)'  iran  =',  irsnl 


initialize  timer 

read  initial  weights  (random)  to  the  boxes: 

r  e  ui  i  n  d  (  u  n  i  t  =  1 9  ) 

do   65  ixoox  =  1,5 

do   60  izbcx  =  1,5 

readdS,*)  ueight  (ixbox  ,  izbox  ) 

continue 

compute  suminit: 

do  75  ixa  =  1,11 

do  74  ixb  =  1X3,11 

suminit(ixa,ixb)  =  1.*  valcoirb(ixa  ,  ixb  ,  2  ,4)  + 
*  l.«  valcomb(ixa  ,ixb  ,4  ,4) 

continue 
continue 

u)rite(6,*)'  finished  computing  all  suminits:' 

initialise  the  sumtrials: 

do  SO  ixa  -  1,11 
do  79  ixb  =  1,11 


79 
80 


82 

83 
84 
85 

c 


83 
90 


95 
c 


sumtrialCixat ixb)  =  0. 
continue 

do  85  ixb  =  lill 
do  84  ixa  =  Ifixb 

do  83  izz  =  1.5 

iz  =  6  -  izz 

do  82  ix  >  1|5 

sumtrialCixa.ixb)  =  suirtrislCixa,  ixb)  ♦  .     -^ 

^  i)i«ight(ix.iz)  *  walcombC  ixa  .  ixb  .  ix  i  iz  J 

continu* 
continue 
continue 

compute  the  initial  error: 
errorold  =  0. 
do  90  ixb  =  1  ill 
do  89  ixa  =  l.ixb 

error  =  ( suminit Cixa, ixb)  -  sumtrialCixa.ixb)) 
errorold  =  errorold  +  2.  »  error'?*2 
continue 

do  95  ixa  =  1.11  ^^ 

errorsub  =  C  summit  (  ixa  .  ixa )  -  sumtriaKlxa,  ixa  J  ; 
errorold  =  errorold  -  errorsub*«2 

t  =  t/fac 

call  secondCtll) 

do  145  iteirp  =  1,999999 

reduce  temperature 

t  =  t  «  fac 

if  Ct  .le.  l.d-24)  go  to  150 


V 


104 
105 


106 


ireject  =  0 
iaccept  =  0 

iup  =  0 

enter  the  loop 

do  135  ilocp2  =  l.ilp2 

add/subtract  from  a  random  box: 

i    =    rannyu(O)    ^    S .    *    I 

j    =    rannyuCO)    *    5.    ♦    1 

lueightneuiCi.j)    =    uieightCi.j)    +    (rannyuCO)    -    0.5)    «    st«p 

if    C    uieightnewCi.  j)    -Qe.    0.)    then 

iflag  =1 

change  =  lueightneuiC  i.  j)   -  uieightCi.j) 

errorneui  =  0. 

do  105  ixb  =  1.11 

do  104  ixa  =  l.ixb 

sumnewCixa.ixb)  =  sumtrialCixa.ixb)  ♦change* 
«  valcombCixa. ixb  .i. j) 

»rr    -    suminitCixa.ixb)  -  sumnewC ixa.ixb) 

errorneui  =  errorneui  +  2.*  err<'*2 

continue 

do  106  ixa  =  1.11 

err  =  sumini t Ci xa  .  ixa)  -  sumneuiCixa  .ixa) 

errorneui  =  errorneui  -  err«*2 


"w 


133 
13* 

135 


else 

reject  the  move: 

iflas  =  * 
end  if 

check  for  acceptance  or  rejectio 


n  of  the  moves! 


if  C  iflag  .sq.  *  ^  ^*^"^ 

iflag.reject  =1 
else  If  Cerrorneui  .It.  errorold  )  then 
iflag.raject  =  0 


else 


proD  =  dexp  ((errorold 
X  =  rannyu(O) 


errornew)  /  ^^ 


if  (x  .It.  prob)  then 
iflag.reject  =  0 

lup  =  lup  ♦  1 

else 

iflag_reject  =  1 

end  if 
end  if 
take  actions  according  to  the  if lag.re ject: 

if  (  iflag. reject  .eq.  1)  then 
ireject  =  ireject  ♦  1 

else 

iaccept  =  iaccept  +1   , .   .. 
u,eight(i,j)  =  uieightneuid,  j) 
errorold  =  errorneui 
do  134  ixa  »  1»11 
do  133  ixb  =  ixa, 11 
sumtriaKixa.ixb)  =  sumn.-iCxxa,  ixb) 

continue 
end  if 
""^'tance  =  dfloat( iaccept )/df loat ( iaccep 


t  +  ireject) 


accep 
1001   f ormat(2e20.9) 

c 


if  (acceptance  .1..  accept  tolerance)  go  to  150 
ratio  =  dfloat(lup)/dfloat(iaccept)  acceptanc 

print*,'  t,  ratio,  acceptance-  ,  t,  raxio. 


145 

c 

150 

c 


154 
155 


continue 
continue 


do  155  iz  =  1,5 

do  154  ix  =  1,5 

iui(ix)  =  u.eight(ix,iz)  «  100. 

uirite(6,1000)iZ,CiUi(i)«l  =  l»5) 

ii)rite(6  »*)  ' 


1000   forfflat(5x,i2,3x,5i4) 
uirite(6,*)'  t=',t 

call  socond(t22) 

t33  =  (t22  -  tll)/60. 


159 
160 


162 

163 

164 
165 

c 


169 
170 


175 


179 

lao 

c 
c 
c 
c 


sumtrialCixa.ixb)  + 

lueight  <ix  ,  iz  )  *  valcombCixa  f  ixbi  ix,  iz) 


compute  the  actual  error: 
initialise  the  sumtrials: 

do  160  ixa  =  1,11 

do  159  ixb  =  1.11 

sumtrialCixa,ixb)  =  0. 

continue 

do  165  ixb  =  1.11 
do  164  ixa  -  l.ixb 
do  163  izz  =  1,5 
iz  =  6  -  izz 
do  162  ix  =  1.5 
suirtrialCixa.ixb) 
+ 

continue 
continue 
continue 

erroract  =  0. 

do  170  ixb  =  1.11 

do  169  ixa  =  l.ixb 

error  *  ( suminitC ixa , ixb)  -  sum tr ial (ix a  ,  ix b)  ) 

erroract  =  erroract  *    2.    ^    error««2 

continue 

do  175  ixa  "    1,11 

errorsub  =  (  suminit  Cixa  ,  ixa)  -  sumtnaK  ixa  .  ixa)  ) 

erroract  =  erroract  -  errorsub«*2 

u)rite(6.*)'  actual  error  ='. erroract 

errorold  =  erroract 

u/rite  the  final  configuration  for  the  future  computation  of  the 
average  configuration  of  all  the  runs: 

do  180  iz  =  1.  5 

do  179  ix  »  1.  5 

ujrite(7,*)iiieight(ix,iz) 

continue  , 

uirite(7,*)'  


uirite  the  last  random  seed 

r  e  ui  i  n  d  C  u  n  i  t  =  1 8  ) 
call  saverrCiranl ) 
uirite(ia.*)iranl 


icount  =  icount  ■••  1 

if  (icount  .It.  4)  then 

t  «  tstart 

go  to  171 
end  if 

call  second(t2) 

uirite(6.«)  'time  (in  minutes): 


■.(t2-tl)/60. 


Stop 
•nd 
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ABSTRACT 


There  are  two  main  classes  of  optimization  techniques  -  deterministic 
optimization  and  stochastic  optimization.  For  many  dimensional  optimization 
tasks,  the  deterministic  techniques  become  computationally  very  cimibersome,  and 
some  of  these  techniques  can  only  produce  a  locally  extremal  state.  Stochastic 
optimization  techniques,  on  the  other  hand,  are  computationally  more  efficient 
for  multidimensional  optimization  tasks.  This  thesis  studies  one  such  stochastic 
optimization  algorithm  called  "Simulated  Annealing" which  has  the  ability  to  come 
out  of  a  locally  extremal  state  even  if  it  happens  to  reach  one.  This  algorithm  has 
been  studied  here  for  least  squares  optimization  with  a  particular  application  to  an 
"Inverse  Problem". 
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